The extraordinary diversity of B and T lymphocyte receptors is key to the adaptive immune system [1]. To leverage high-throughput sequencing for the characterization of the T cell receptor (TCR) repertoire, total RNA is extracted from isolated T lymphocytes and reverse transcribed to cDNA. The complete 5' end of the TCR genes, which contains the variable regions, is amplified with oligonucleotide primers against the TCR constant regions (Figure 1.1). The resulting fragments are used for next-generation sequencing library preparation and subjected to high-throughput sequencing.
Figure 1.1 Immuno-Profiling workflow
The raw sequencing data in FASTQ format were subjected to quality filtering [2]. Sequences that passed quality filtering were mapped against the IMGT database to find the best germline V(D)J gene matches [3]. CDR sequences were then further characterized and analyzed.
Figure 2.1 Bioinformatics analysis workflow
For TCR analysis, only reads that cover the CDR3 region were analyzed. Raw FASTQ files were first subjected to quality assessment (Figure 3.1) [4]. Bases with poor quality scores (Q<20) were removed using Trimmomatic (v0.30) [2]. The trimmed data were also subjected to quality assessment (Figure 3.2) [4]. Data processing statistics are summarized in Table 3.1 (raw reads) and Table 3.2 (trimmed reads).
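The Q<20 threshold refers to Phred quality scores, where Q = -10 * log10(p) and p is the estimated base-call error probability, so Q20 corresponds to a 1% error rate. The following minimal sketch decodes a FASTQ quality string and flags bases below the threshold. It assumes the common Phred+33 ASCII encoding, which this report does not state, and it is not Trimmomatic's trimming algorithm; the function names and the example string are illustrative only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 (Sanger / Illumina 1.8+ encoding) is an assumption.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def low_quality_positions(quality_string, threshold=20, offset=33):
    """Return 0-based positions whose Phred score is below the threshold."""
    scores = phred_scores(quality_string, offset)
    return [i for i, q in enumerate(scores) if q < threshold]


# Illustrative quality string, not taken from the report's data.
example = "IIII5+I"
print(phred_scores(example))           # [40, 40, 40, 40, 20, 10, 40]
print(low_quality_positions(example))  # [5]
```

A base with score exactly 20 is kept here because the report describes the cutoff as Q<20.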
Figure 3.1 Sequence quality across all bases on raw reads. Y axis: Phred Quality Scores.
Figure 3.2 Sequence quality across all bases on trimmed reads. Y axis: Phred Quality Scores.
Table 3.1 Raw sequencing data quality statistics
Sample | Total_read | Q20_read | Q20(%) | Q30_read | Q30(%) |
---|---|---|---|---|---|
GQ01-E120-TRA | 25,000 | 24,984 | 99.94 | 20,969 | 83.88 |
GQ01-E120-TRB | 25,000 | 24,976 | 99.90 | 20,574 | 82.30 |
GQ02-E120-TRA | 25,000 | 24,983 | 99.93 | 21,462 | 85.85 |
GQ02-E120-TRB | 25,000 | 24,964 | 99.86 | 20,916 | 83.66 |
GQ03-E120-TRA | 25,000 | 24,989 | 99.96 | 20,956 | 83.82 |
GQ03-E120-TRB | 25,000 | 24,979 | 99.92 | 21,279 | 85.12 |
GQ04-E120-TRA | 25,000 | 24,980 | 99.92 | 21,627 | 86.51 |
GQ04-E120-TRB | 27,500 | 27,478 | 99.92 | 23,572 | 85.72 |
GQ05-E120-TRA | 2,500 | 2,498 | 99.92 | 2,105 | 84.20 |
GQ05-E120-TRB | 27,500 | 27,473 | 99.90 | 23,446 | 85.26 |
GQ06-E120-TRA | 25,000 | 24,976 | 99.90 | 21,348 | 85.39 |
GQ06-E120-TRB | 27,500 | 27,468 | 99.88 | 23,390 | 85.05 |
GQ07-E120-TRA | 27,500 | 27,486 | 99.95 | 23,443 | 85.25 |
GQ07-E120-TRB | 2,500 | 2,498 | 99.92 | 1,988 | 79.52 |
GQ08-E120-TRA | 27,500 | 27,478 | 99.92 | 23,370 | 84.98 |
GQ08-E120-TRB | 25,000 | 24,975 | 99.90 | 20,862 | 83.45 |
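
The percentage columns in Table 3.1 follow directly from the read-count columns. As a quick check, the sketch below recomputes Q20(%) and Q30(%) for one row; the helper name `percent` is illustrative, and how a read is classified as a Q20 or Q30 read is not specified in this report.

```python
def percent(part, total):
    """Percentage of `part` in `total`, rounded to two decimals as in the table."""
    return round(100 * part / total, 2)


# Values copied from the GQ01-E120-TRA row of Table 3.1.
total_reads, q20_reads, q30_reads = 25_000, 24_984, 20_969

print(percent(q20_reads, total_reads))  # 99.94
print(percent(q30_reads, total_reads))  # 83.88
```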
Table 3.2 Trimmed sequencing data quality statistics
Sample | Total_read | Q20_read | Q20(%) | Q30_read | Q30(%) |
---|---|---|---|---|---|
GQ01-E120-TRA | 22,811 | 22,811 | 100.00 | 22,811 | 100.00 |
GQ01-E120-TRB | 22,869 | 22,869 | 100.00 | 22,868 | 100.00 |
GQ02-E120-TRA | 23,040 | 23,040 | 100.00 | 23,039 | 100.00 |
GQ02-E120-TRB | 22,657 | 22,657 | 100.00 | 22,657 | 100.00 |
GQ03-E120-TRA | 21,586 | 21,586 | 100.00 | 21,586 | 100.00 |
GQ03-E120-TRB | 21,996 | 21,996 | 100.00 | 21,996 | 100.00 |
GQ04-E120-TRA | 21,879 | 21,879 | 100.00 | 21,879 | 100.00 |
GQ04-E120-TRB | 23,979 | 23,979 | 100.00 | 23,979 | 100.00 |
GQ05-E120-TRA | 2,298 | 2,298 | 100.00 | 2,298 | 100.00 |
GQ05-E120-TRB | 24,263 | 24,263 | 100.00 | 24,263 | 100.00 |
GQ06-E120-TRA | 22,917 | 22,917 | 100.00 | 22,917 | 100.00 |
GQ06-E120-TRB | 23,798 | 23,798 | 100.00 | 23,798 | 100.00 |
GQ07-E120-TRA | 24,035 | 24,035 | 100.00 | 24,035 | 100.00 |
GQ07-E120-TRB | 2,272 | 2,272 | 100.00 | 2,272 | 100.00 |
GQ08-E120-TRA | 23,719 | 23,719 | 100.00 | 23,718 | 100.00 |
GQ08-E120-TRB | 22,623 | 22,623 | 100.00 | 22,623 | 100.00 |
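
Comparing Table 3.1 with Table 3.2 gives the fraction of reads that survived trimming. The sketch below computes this for three samples; the retention percentage is a derived quantity that does not appear in the report, and the variable names are illustrative.

```python
# Total_read values copied from Table 3.1 (raw) and Table 3.2 (trimmed).
raw = {"GQ01-E120-TRA": 25_000, "GQ01-E120-TRB": 25_000, "GQ05-E120-TRA": 2_500}
trimmed = {"GQ01-E120-TRA": 22_811, "GQ01-E120-TRB": 22_869, "GQ05-E120-TRA": 2_298}

# Percentage of raw reads retained after quality trimming, per sample.
retained = {
    sample: round(100 * trimmed[sample] / raw[sample], 2)
    for sample in raw
}

for sample, pct in retained.items():
    print(f"{sample}: {pct}% of raw reads retained")
```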
The assembled reads were aligned against the IMGT database with IgBLAST to identify the best-matching germline V(D)J genes [5]. Because the TCR CDR3 region contains information on V, D and J gene usage, this report focuses on CDR3 analysis. The alignment results are shown in Tables 4.1.1 and 4.1.2. The complete output for all samples is in the 'TCR_mapping' directory and can be accessed using the link below:
Table 4.1.1 CDR3 alignment result for TCR alpha
Freq | Count | CDR3nt | CDR3aa | V_gene | J_gene | V_end_pos | J_start_pos | V_end_del | J_start_del |
---|---|---|---|---|---|---|---|---|---|
1.5179 | 208 | TGTGCTGTTCTTAATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT | CAVLNAGGTSYGKLTF | TRAV21*01 | TRAJ52*01 | 8 | 11 | 4 | 1 |
1.1749 | 161 | TGCCCTAGGAGAGCACTTACTTTT | CPRRALTF | TRAV26-2*01 | TRAJ5*01 | 3 | 6 | 12 | 11 |
0.4014 | 55 | TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG | CAVMDSNYQLIW | TRAV1-2*01 | TRAJ33*01 | 10 | 10 | 4 | 0 |
0.3868 | 53 | TGTGCTGTAAGCAGAGGCTCAACCCTGGGGAGGCTATACTTT | CAVSRGSTLGRLYF | TRAV36/DV7*01 | TRAJ18*01 | 8 | 11 | 5 | 4 |
0.343 | 47 | TGTGCTGTGCAGGACCTATTAACCAGTGGCTCTAGGTTGACCTTT | CAVQDLLTSGSRLTF | TRAV20*01 | TRAJ58*01 | 13 | 20 | 0 | 7 |
0.2481 | 34 | TGTGCAGCCCCCAATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT | CAAPNAGGTSYGKLTF | TRAV13-1*01 | TRAJ52*01 | 8 | 12 | 5 | 2 |
0.2262 | 31 | TGTGCTTATAGGAGCAGCAGAGATGACAAGATCATCTTT | CAYRSSRDDKIIF | TRAV38-2/DV8*01 | TRAJ30*01 | 15 | 17 | 1 | 4 |
0.2189 | 30 | TGTGCTGTGATGGATAGCAGCTATAAATTGATCTTC | CAVMDSSYKLIF | TRAV1-2*01 | TRAJ12*01 | 10 | 8 | 4 | 1 |
0.2189 | 30 | TGTGCTGTGAGAGATAGCAACTATCAGTTAATCTGG | CAVRDSNYQLIW | TRAV1-2*01 | TRAJ33*01 | 14 | 12 | 0 | 2 |
0.1897 | 26 | TGTGCTGCCATGGATAGCAACTATCAGTTAATCTGG | CAAMDSNYQLIW | TRAV1-2*01 | TRAJ33*01 | 7 | 10 | 7 | 0 |
0.1751 | 24 | TGTGCTTATAGACGTGGCTCTAGGTTGACCTTT | CAYRRGSRLTF | TRAV38-2/DV8*01 | TRAJ58*01 | 11 | 13 | 5 | 12 |
0.1678 | 23 | TGTGCTGTCATGGATAGCAACTATCAGTTAATCTGG | CAVMDSNYQLIW | TRAV1-2*01 | TRAJ33*01 | 8 | 10 | 6 | 0 |
0.1533 | 21 | TGCCTCGTGGGTGGGGCGATGGGAGGTGCTGACGGACTCACCTTT | CLVGGAMGGADGLTF | TRAV4*01 | TRAJ45*01 | 13 | 21 | 3 | 11 |
0.1314 | 18 | TGTGAAAATCGGCGGGGCAACACAGGCAAACTAATCTTT | CENRRGNTGKLIF | TRAV13-2*01 | TRAJ37*01 | 4 | 15 | 9 | 7 |
0.1241 | 17 | TGTGCCGTGGAGAGGATGGATAGCAGCTATAAATTGATCTTC | CAVERMDSSYKLIF | TRAV12-2*01 | TRAJ12*01 | 9 | 13 | 4 | 0 |
0.1241 | 17 | TGTGCAGAGAATTCCCCTACCTCAGGAACCTACAAATACATCTTT | CAENSPTSGTYKYIF | TRAV13-2*01 | TRAJ40*01 | 12 | 16 | 1 | 1 |
0.1168 | 16 | TGTGCTCTGATCGCCCAGGCAGGAACTGCTCTGATCTTT | CALIAQAGTALIF | TRAV9-2*01 | TRAJ15*01 | 10 | 14 | 4 | 4 |
0.1168 | 16 | TGTGCCGTGGAGAGGATGGATAGCAGCTATAAATTGATCTTC | CAVERMDSSYKLIF | TRAV12-2*01 | TRAJ12*01 | 9 | 13 | 4 | 0 |
0.1095 | 15 | TGTGCTGTGCCGCCGGTATCAGGAGGAAGCTACATACCTACATTT | CAVPPVSGGSYIPTF | TRAV20*01 | TRAJ6*01 | 10 | 17 | 3 | 3 |
0.1022 | 14 | TGTGCTGTGAAGGATAGCAACTATCAGTTAATCTGG | CAVKDSNYQLIW | TRAV1-2*01 | TRAJ33*01 | 10 | 11 | 4 | 1 |
Shown is the partial result of one sample (GQ01-E120). Freq: the frequency (percentage) of this clone in the entire population (e.g. 1.5 = 1.5%). Count: read count of the clone. CDR3nt: CDR3 nucleotide sequence. CDR3aa: CDR3 amino acid sequence. V_gene: best-aligned germline V gene. D_gene: best-aligned germline D gene. J_gene: best-aligned germline J gene. V_end_pos: the ending position of the V gene in the CDR3 nucleotide sequence. D_start_pos: the starting position of the D gene in the CDR3 nucleotide sequence. D_end_pos: the ending position of the D gene in the CDR3 nucleotide sequence. J_start_pos: the starting position of the J gene in the CDR3 nucleotide sequence. V_end_del: the number of nucleotides deleted from the 3' end of the V gene. D_start_del: the number of nucleotides deleted from the 5' end of the D gene. D_end_del: the number of nucleotides deleted from the 3' end of the D gene. J_start_del: the number of nucleotides deleted from the 5' end of the J gene. "-1": gene sequence not defined. The D gene columns appear only in the TCR beta table (Table 4.1.2).
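
The CDR3aa column is the in-frame translation of the CDR3nt column. The sketch below reproduces that relationship with the standard genetic code for two rows of Table 4.1.1. It is an illustration only, not the pipeline's own code, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdr3_nt):
    """Translate an in-frame CDR3 nucleotide sequence to amino acids."""
    if len(cdr3_nt) % 3 != 0:
        raise ValueError("CDR3 nucleotide length is not a multiple of 3")
    return "".join(
        CODON_TABLE[cdr3_nt[i:i + 3]] for i in range(0, len(cdr3_nt), 3)
    )


# CDR3nt values copied from the first two rows of Table 4.1.1.
print(translate("TGTGCTGTTCTTAATGCTGGTGGTACTAGCTATGGAAAGCTGACATTT"))  # CAVLNAGGTSYGKLTF
print(translate("TGCCCTAGGAGAGCACTTACTTTT"))                          # CPRRALTF
```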
Table 4.1.2 CDR3 alignment result for TCR beta
Freq | Count | CDR3nt | CDR3aa | V_gene | D_gene | J_gene | V_end_pos | D_start_pos | D_end_pos | J_start_pos | V_end_del | D_start_del | D_end_del | J_start_del |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.777 | 94 | TGTGCCAGCAGTTTATCAAGACAGGGTAGGGGGGGTACGCAGTATTTT | CASSLSRQGRGGTQYF | TRBV27*01 | TRBD1*01 | TRBJ2-3*01 | 17 | 27 | 33 | 35 | 0 | 5 | 1 | 8 |
0.5125 | 62 | TGTGCCAGCAGTCGACAGAGAGGGGGCAATCAGCCCCAGCATTTT | CASSRQRGGNQPQHF | TRBV12-3*01 | TRBD1*01 | TRBJ1-5*01 | 12 | 13 | 18 | 25 | 5 | 2 | 5 | 2 |
0.5042 | 61 | TGTGCCAGCAGTTTAGACGCGACAACAGATACGCAGTATTTT | CASSLDATTDTQYF | TRBV28*01 | . | TRBJ2-3*01 | 15 | -1 | -1 | 24 | 2 | -1 | -1 | 3 |
0.3472 | 42 | TGTGCCAGCAGTTTGAGAGGGGGGGGACTCCGGGAGCAGTACTTC | CASSLRGGGLREQYF | TRBV12-3*01 | TRBD2*01 | TRBJ2-7*01 | 14 | 19 | 26 | 33 | 3 | 9 | 0 | 7 |
0.3224 | 39 | TGTGCCAGCAGTTTTCTCGACGGAGCGGCAGGAGTGGATACGCAGTATTTT | CASSFLDGAAGVDTQYF | TRBV7-8*01 | TRBD2*01 | TRBJ2-3*01 | 14 | 23 | 28 | 36 | 3 | 6 | 5 | 6 |
0.3224 | 39 | TGCGCCAGCAGCCTTCGGACAGGACCAAAGCAGTACTTC | CASSLRTGPKQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-7*01 | 12 | 16 | 23 | 28 | 4 | 1 | 4 | 8 |
0.2728 | 33 | TGCGCCAGCAGCCAATGGACAGGAATCAATCAGCCCCAGCATTTT | CASSQWTGINQPQHF | TRBV5-1*01 | TRBD1*01 | TRBJ1-5*01 | 12 | 16 | 23 | 26 | 4 | 1 | 4 | 3 |
0.2149 | 26 | TGTGCCACTACGTTGCAGGGGGTGGATGGGGCCAACGTCCTGACTTTC | CATTLQGVDGANVLTF | TRBV6-5*01 | TRBD1*01 | TRBJ2-6*01 | 7 | 15 | 22 | 26 | 10 | 4 | 1 | 3 |
0.2066 | 25 | TGCGCCAGCAGCCAAGATCTAACAGTCGAAAACATTCAGTACTTC | CASSQDLTVENIQYF | TRBV4-3*01 | . | TRBJ2-4*01 | 17 | -1 | -1 | 28 | 0 | -1 | -1 | 5 |
0.1653 | 20 | TGTGCCAGCAGCGCGAAAAACTATGGCTACACCTTC | CASSAKNYGYTF | TRBV27*01 | . | TRBJ1-2*01 | 11 | -1 | -1 | 18 | 6 | -1 | -1 | 2 |
0.1571 | 19 | TGCAGTGCCTTCATGGTGGGGGATGAGCAGTTCTTC | CSAFMVGDEQFF | TRBV20-1*01 | TRBD1*01 | TRBJ2-1*01 | 8 | 17 | 22 | 22 | 6 | 6 | 1 | 8 |
0.1571 | 19 | TGTGCCAGCAGTGAAGTGAGCGGCTCCGGAGGAGATACGCAGTATTTT | CASSEVSGSGGDTQYF | TRBV6-1*01 | TRBD2*01 | TRBJ2-3*01 | 16 | 18 | 23 | 32 | 1 | 6 | 5 | 5 |
0.1488 | 18 | TGCGCCAGCAGCTTGAATTTTCTGTCCGGGAACCCCTACAATGAGCAGTTCTTC | CASSLNFLSGNPYNEQFF | TRBV4-1*01 | . | TRBJ2-1*01 | 12 | -1 | -1 | 34 | 5 | -1 | -1 | 2 |
0.1488 | 18 | TGTGCCAGCAGCTTCCCGAGGTTGGCAGATACGCAGTATTTT | CASSFPRLADTQYF | TRBV11-2*01 | . | TRBJ2-3*01 | 14 | -1 | -1 | 25 | 3 | -1 | -1 | 4 |
0.1405 | 17 | TGCGCCAGCAGCCACTTAGGGGGAGAAGGCTACGAGCAGTACTTC | CASSHLGGEGYEQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-7*01 | 12 | 17 | 23 | 29 | 4 | 5 | 1 | 3 |
0.124 | 15 | TGCGCCAGCAGCCGCGGATTAGGGACAACAAGCACAGATACGCAGTATTTT | CASSRGLGTTSTDTQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-3*01 | 12 | 21 | 27 | 30 | 4 | 0 | 6 | 0 |
0.124 | 15 | TGTGCCAGCAGCGCCCACACCGGGGAGCTGTTTTTT | CASSAHTGELFF | TRBV9*01 | . | TRBJ2-2*01 | 13 | -1 | -1 | 16 | 3 | -1 | -1 | 3 |
0.1157 | 14 | TGCAGCGACAGGGAATACAATGAGCAGTTCTTC | CSDREYNEQFF | TRBV29-1*01 | TRBD1*01 | TRBJ2-1*01 | 7 | 6 | 13 | 15 | 7 | 2 | 3 | 4 |
0.1075 | 13 | TGTGCCAGCAGCTTACGGGGGGGACAGGGCGGAGAGACCCAGTACTTC | CASSLRGGQGGETQYF | TRBV7-2*01 | TRBD1*01 | TRBJ2-5*01 | 15 | 16 | 21 | 32 | 2 | 6 | 1 | 4 |
Shown is the partial result of one sample (GQ01-E120).
The outputs for the CDR3 amino acid abundancy analysis for all the samples are in the 'TCR_analysis/CDR3_abundancy' directory and can be accessed using the link below:
Table 4.2.1 CDR3 amino acid abundancy for TCR alpha
Freq | Count | CDR3aa | V_gene | D_gene | J_gene |
---|---|---|---|---|---|
1.9101 | 261 | CAVLNAGGTSYGKLTF | TRAV21*01 | . | TRAJ52*01 |
1.5735 | 215 | CPRRALTF | TRAV26-2*01 | . | TRAJ5*01 |
0.7977 | 109 | CAVMDSNYQLIW | TRAV1-2*01 | . | TRAJ33*01 |
0.6733 | 92 | CAVSRGSTLGRLYF | TRAV36/DV7*01 | . | TRAJ18*01 |
0.4245 | 58 | CAVQDLLTSGSRLTF | TRAV20*01 | . | TRAJ58*01 |
0.3879 | 53 | CAVRDSNYQLIW | TRAV1-2*01 | . | TRAJ33*01 |
0.3074 | 42 | CAAMDSNYQLIW | TRAV1-2*01 | . | TRAJ33*01 |
0.2927 | 40 | CAVMDSSYKLIF | TRAV1-2*01 | . | TRAJ12*01 |
0.2708 | 37 | CAVERMDSSYKLIF | TRAV12-2*01 | . | TRAJ12*01 |
0.2708 | 37 | CAYRSSRDDKIIF | TRAV38-2/DV8*01 | . | TRAJ30*01 |
0.2708 | 37 | CAVLDSNYQLIW | TRAV1-2*01 | . | TRAJ33*01 |
0.2561 | 35 | CAAPNAGGTSYGKLTF | TRAV13-1*01 | . | TRAJ52*01 |
0.2122 | 29 | CAYRRGSRLTF | TRAV38-2/DV8*01 | . | TRAJ58*01 |
0.1903 | 26 | CALIAQAGTALIF | TRAV9-2*01 | . | TRAJ15*01 |
0.1683 | 23 | CAENSPTSGTYKYIF | TRAV13-2*01 | . | TRAJ40*01 |
0.1683 | 23 | CLVGGAMGGADGLTF | TRAV4*01 | . | TRAJ45*01 |
0.161 | 22 | CAYLGNTPLVF | TRAV38-2/DV8*01 | . | TRAJ29*01 |
0.1317 | 18 | CAVKDSNYQLIW | TRAV1-2*01 | . | TRAJ33*01 |
0.1317 | 18 | CENRRGNTGKLIF | TRAV13-2*01 | . | TRAJ37*01 |
Shown is the partial result of one sample (GQ01-E120). Freq: the frequency (percentage) of this clone in the entire population (e.g. 1.5 = 1.5%). Count: read count of the clone. CDR3aa: CDR3 amino acid sequence.
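
Different nucleotide sequences can encode the same CDR3 amino acid sequence, which is why the counts in Table 4.2.1 are higher than the per-nucleotide counts in Table 4.1.1. The sketch below shows one plausible way to collapse nucleotide clonotypes into amino acid clonotypes, assuming the key is (CDR3aa, V_gene, J_gene); the report does not state the exact key. Because Table 4.1.1 is a partial listing, the sums here are lower than the values in Table 4.2.1.

```python
from collections import Counter

# (Count, CDR3nt, CDR3aa, V_gene, J_gene) rows copied from Table 4.1.1.
rows = [
    (55, "TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG", "CAVMDSNYQLIW", "TRAV1-2*01", "TRAJ33*01"),
    (23, "TGTGCTGTCATGGATAGCAACTATCAGTTAATCTGG", "CAVMDSNYQLIW", "TRAV1-2*01", "TRAJ33*01"),
    (30, "TGTGCTGTGATGGATAGCAGCTATAAATTGATCTTC", "CAVMDSSYKLIF", "TRAV1-2*01", "TRAJ12*01"),
]


def collapse_to_amino_acid(rows):
    """Sum read counts over nucleotide variants of the same amino acid clonotype."""
    collapsed = Counter()
    for count, _cdr3_nt, cdr3_aa, v_gene, j_gene in rows:
        collapsed[(cdr3_aa, v_gene, j_gene)] += count
    return collapsed


def frequencies(collapsed):
    """Express each collapsed count as a percentage of all counted reads."""
    total = sum(collapsed.values())
    return {clone: 100 * count / total for clone, count in collapsed.items()}


collapsed = collapse_to_amino_acid(rows)
for clone, count in collapsed.most_common():
    print(clone, count)
```

In this example the two nucleotide variants of CAVMDSNYQLIW merge into a single clonotype with 78 reads.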
Table 4.2.2 CDR3 amino acid abundancy for TCR beta
Freq | Count | CDR3aa | V_gene | D_gene | J_gene |
---|---|---|---|---|---|
0.9425 | 114 | CASSLSRQGRGGTQYF | TRBV27*01 | TRBD1*01 | TRBJ2-3*01 |
0.7275 | 88 | CASSLDATTDTQYF | TRBV28*01 | . | TRBJ2-3*01 |
0.6531 | 79 | CASSRQRGGNQPQHF | TRBV12-3*01 | TRBD1*01 | TRBJ1-5*01 |
0.463 | 56 | CASSLRGGGLREQYF | TRBV12-3*01 | TRBD2*01 | TRBJ2-7*01 |
0.3886 | 47 | CASSLRTGPKQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-7*01 |
0.3803 | 46 | CASSFLDGAAGVDTQYF | TRBV7-8*01 | TRBD2*01 | TRBJ2-3*01 |
0.2728 | 33 | CASSQWTGINQPQHF | TRBV5-1*01 | TRBD1*01 | TRBJ1-5*01 |
0.2728 | 33 | CATTLQGVDGANVLTF | TRBV6-5*01 | TRBD1*01 | TRBJ2-6*01 |
0.2232 | 27 | CASSLNFLSGNPYNEQFF | TRBV4-1*01 | . | TRBJ2-1*01 |
0.2232 | 27 | CASSQDLTVENIQYF | TRBV4-3*01 | . | TRBJ2-4*01 |
0.2067 | 25 | CASSAKNYGYTF | TRBV27*01 | . | TRBJ1-2*01 |
0.1984 | 24 | CSAFMVGDEQFF | TRBV20-1*01 | TRBD1*01 | TRBJ2-1*01 |
0.1901 | 23 | CASSFPRLADTQYF | TRBV11-2*01 | . | TRBJ2-3*01 |
0.1901 | 23 | CASSEVSGSGGDTQYF | TRBV6-1*01 | TRBD2*01 | TRBJ2-3*01 |
0.1736 | 21 | CASSHLGGEGYEQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-7*01 |
0.1653 | 20 | CASSAHTGELFF | TRBV9*01 | . | TRBJ2-2*01 |
0.1571 | 19 | CASSRGLGTTSTDTQYF | TRBV5-1*01 | TRBD1*01 | TRBJ2-3*01 |
0.1405 | 17 | CASSPVASGRGEQYF | TRBV7-8*01 | TRBD2*01 | TRBJ2-7*01 |
0.124 | 15 | CSDREYNEQFF | TRBV29-1*01 | TRBD1*01 | TRBJ2-1*01 |
Shown is the partial result of one sample (GQ01-E120).
CDR3 amino acid sequences were further extracted and the length distribution was plotted as in Figure 4.3.
Figure 4.3 CDR3 amino acid sequence length distribution.
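
A length distribution of this kind can be tallied directly from the CDR3aa column. The sketch below uses five rows from Table 4.2.1. Whether Figure 4.3 weights each length by read count or counts each unique clonotype once is not stated in this report, so the `weighted` flag is an assumption that covers both options.

```python
from collections import Counter

# (Count, CDR3aa) pairs copied from Table 4.2.1.
clones = [
    (261, "CAVLNAGGTSYGKLTF"),
    (215, "CPRRALTF"),
    (109, "CAVMDSNYQLIW"),
    (92, "CAVSRGSTLGRLYF"),
    (53, "CAVRDSNYQLIW"),
]


def cdr3_length_distribution(clones, weighted=True):
    """Tally CDR3 amino acid lengths.

    weighted=True adds the read count of each clone;
    weighted=False counts each unique clone once.
    """
    distribution = Counter()
    for count, cdr3_aa in clones:
        distribution[len(cdr3_aa)] += count if weighted else 1
    return distribution


for length, reads in sorted(cdr3_length_distribution(clones).items()):
    print(length, reads)
```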
V(D)J usage of the CDR3 sequences was further analyzed. CDR3 sequences were collapsed by V(D)J combination. Figures 4.4.1 and 4.4.2 show the proportion of the top ten combinations in the entire population.
Figure 4.4.1 Top VJ combination usage for TCR alpha.
Figure 4.4.2 Top VDJ combination usage for TCR beta.
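
Collapsing by gene combination and ranking the result can be sketched as follows. The rows are taken from Table 4.2.1, so the proportions are relative to these six rows only and will not match the figures; the function name `top_combinations` is illustrative. For TCR beta the gene tuple would carry three elements (V, D, J) instead of two.

```python
from collections import Counter

# (Count, (V_gene, J_gene)) pairs copied from Table 4.2.1.
rows = [
    (261, ("TRAV21*01", "TRAJ52*01")),
    (215, ("TRAV26-2*01", "TRAJ5*01")),
    (109, ("TRAV1-2*01", "TRAJ33*01")),
    (53, ("TRAV1-2*01", "TRAJ33*01")),
    (42, ("TRAV1-2*01", "TRAJ33*01")),
    (35, ("TRAV13-1*01", "TRAJ52*01")),
]


def top_combinations(rows, n=10):
    """Return the n most used gene combinations as (genes, count, percent)."""
    usage = Counter()
    for count, genes in rows:
        usage[genes] += count
    total = sum(usage.values())
    return [
        (genes, count, 100 * count / total)
        for genes, count in usage.most_common(n)
    ]


for genes, count, pct in top_combinations(rows):
    print("/".join(genes), count, f"{pct:.2f}%")
```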
To assess germline gene coverage, V gene annotation information was extracted for all the sequences and the total count of each germline gene was calculated and summarized in Table 4.5.1 (TCR alpha) and Table 4.5.2 (TCR beta).
Table 4.5.1 TCR alpha V gene usage
V_gene | GQ01-E120-TRA | GQ02-E120-TRA | GQ03-E120-TRA | GQ04-E120-TRA | GQ05-E120-TRA | GQ06-E120-TRA | GQ07-E120-TRA | GQ08-E120-TRA |
---|---|---|---|---|---|---|---|---|
TRAV1 | 669 | 693 | 648 | 617 | 51 | 508 | 505 | 573 |
TRAV10 | 242 | 241 | 201 | 205 | 13 | 193 | 182 | 157 |
TRAV12 | 952 | 862 | 926 | 803 | 101 | 931 | 1109 | 940 |
TRAV13 | 1807 | 1545 | 1587 | 1483 | 142 | 1260 | 1335 | 1272 |
TRAV14 | 221 | |||||||
TRAV16 | 118 | 109 | 114 | 116 | 10 | 110 | 144 | 128 |
TRAV17 | 356 | 306 | 294 | 290 | 44 | 362 | 405 | 412 |
TRAV19 | 54 | 61 | 55 | 57 | 8 | 165 | 159 | 174 |
TRAV2 | 314 | 327 | 282 | 359 | 54 | 588 | 545 | 595 |
TRAV20 | 826 | 748 | 733 | 729 | 29 | 280 | 349 | 349 |
TRAV21 | 984 | 993 | 993 | 968 | 43 | 497 | 538 | 512 |
TRAV22 | 145 | 123 | 158 | 123 | 13 | 114 | 136 | 131 |
TRAV23 | 178 | |||||||
TRAV24 | 72 | 82 | 65 | 64 | 3 | 68 | 58 | 78 |
TRAV25 | 98 | 105 | 114 | 89 | 18 | 119 | 131 | 127 |
TRAV26 | 969 | 952 | 884 | 902 | 64 | 680 | 590 | 678 |
TRAV27 | 367 | 344 | 317 | 340 | 31 | 279 | 315 | 338 |
TRAV29 | 301 | |||||||
TRAV3 | 96 | 84 | 94 | 87 | 13 | 151 | 115 | 168 |
TRAV30 | 179 | 137 | 161 | 141 | 20 | 151 | 159 | 154 |
TRAV34 | 25 | 28 | 34 | 20 | 7 | 33 | 42 | 30 |
TRAV35 | 215 | 194 | 194 | 170 | 29 | 210 | 257 | 212 |
TRAV36 | 99 | |||||||
TRAV38 | 1904 | 1851 | 1898 | 1766 | 112 | 1057 | 1240 | 1142 |
TRAV39 | 78 | 109 | 88 | 88 | 12 | 87 | 124 | 94 |
TRAV4 | 458 | 410 | 447 | 421 | 27 | 311 | 299 | 285 |
TRAV40 | 30 | 30 | 25 | 21 | 2 | 22 | 33 | 24 |
TRAV41 | 184 | 178 | 155 | 163 | 15 | 210 | 217 | 179 |
TRAV5 | 168 | 168 | 137 | 161 | 21 | 212 | 247 | 200 |
TRAV6 | 83 | 88 | 74 | 71 | 13 | 180 | 191 | 169 |
TRAV8 | 732 | 696 | 720 | 670 | 84 | 918 | 971 | 935 |
TRAV9 | 459 | 478 | 410 | 406 | 61 | 639 | 570 | 674 |
TRAV7 | NA | NA | 2 | NA | NA | NA | NA | NA |
Table 4.5.2 TCR beta V gene usage
V_gene | GQ01-E120-TRB | GQ02-E120-TRB | GQ03-E120-TRB | GQ04-E120-TRB | GQ05-E120-TRB | GQ06-E120-TRB | GQ07-E120-TRB | GQ08-E120-TRB |
---|---|---|---|---|---|---|---|---|
TRBV1 | 1 | 1 | 2 | NA | NA | NA | NA | NA |
TRBV10 | 337 | 289 | 316 | 322 | 352 | 295 | 37 | 278 |
TRBV11 | 246 | 249 | 260 | 291 | 399 | 340 | 40 | 339 |
TRBV12 | 1258 | 1166 | 1187 | 1275 | 999 | 824 | 98 | 821 |
TRBV13 | 24 | 18 | 21 | 20 | 48 | 47 | 2 | 39 |
TRBV14 | 60 | 58 | 63 | 64 | 113 | 101 | 9 | 72 |
TRBV15 | 201 | 170 | 174 | 209 | 249 | 198 | 31 | 194 |
TRBV16 | 3 | 3 | 2 | 3 | 9 | 5 | 2 | 5 |
TRBV18 | 170 | 134 | 171 | 163 | 284 | 204 | 30 | 222 |
TRBV19 | 179 | 129 | 170 | 137 | 228 | 180 | 23 | 197 |
TRBV2 | 156 | 148 | 180 | 168 | 354 | 201 | 28 | 226 |
TRBV20 | 1298 | 1166 | 1263 | 1250 | 1480 | 1250 | 153 | 1285 |
TRBV21 | 29 | 20 | 42 | 20 | 40 | 36 | 2 | 34 |
TRBV23 | 18 | 12 | 11 | 10 | 28 | 34 | 2 | 35 |
TRBV24 | 177 | 143 | 150 | 138 | 196 | 119 | 20 | 112 |
TRBV25 | 70 | 65 | 71 | 63 | 148 | 87 | 10 | 72 |
TRBV27 | 418 | 453 | 419 | 435 | 336 | 264 | 38 | 262 |
TRBV28 | 675 | 591 | 628 | 654 | 593 | 474 | 50 | 422 |
TRBV29 | 805 | 828 | 748 | 792 | 1014 | 959 | 91 | 953 |
TRBV3 | 219 | 176 | 223 | 226 | 407 | 276 | 39 | 282 |
TRBV30 | 157 | 153 | 166 | 169 | 160 | 135 | 12 | 119 |
TRBV4 | 529 | 524 | 526 | 582 | 816 | 700 | 69 | 734 |
TRBV5 | 2079 | 1899 | 2175 | 2185 | 2441 | 2033 | 234 | 1712 |
TRBV6 | 955 | 927 | 958 | 1020 | 1302 | 1037 | 101 | 998 |
TRBV7 | 1828 | 1732 | 1874 | 1931 | 2762 | 2354 | 216 | 2105 |
TRBV9 | 204 | 217 | 223 | 226 | 393 | 343 | 40 | 330 |
TRBV24 | 7 |
The distribution of V gene usage is also illustrated in bar graphs (Figures 4.5.1 and 4.5.2).
Figure 4.5.1 V gene usage of TCR alpha.
Figure 4.5.2 V gene usage of TCR beta.
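
The row labels in Tables 4.5.1 and 4.5.2 (for example TRAV1, TRAV38, TRBV5) suggest that allele-level annotations such as TRAV1-2*01 were collapsed to the V gene subgroup before counting. The sketch below shows one way to do this with a regular expression. The collapsing rule is an assumption inferred from the table labels, and the counts used are the per-clone counts from Table 4.1.1 rather than the full data set.

```python
import re
from collections import Counter

# Subgroup = locus prefix plus the first run of digits, e.g. TRAV38 or TRBV5.
SUBGROUP_PATTERN = re.compile(r"^(TR[AB]V\d+)")


def v_subgroup(v_gene):
    """Reduce an allele-level V gene name to its subgroup name."""
    match = SUBGROUP_PATTERN.match(v_gene)
    if match is None:
        raise ValueError(f"unrecognized V gene name: {v_gene!r}")
    return match.group(1)


# (Count, V_gene) pairs copied from Table 4.1.1.
annotated = [
    (208, "TRAV21*01"),
    (161, "TRAV26-2*01"),
    (55, "TRAV1-2*01"),
    (53, "TRAV36/DV7*01"),
    (31, "TRAV38-2/DV8*01"),
    (30, "TRAV1-2*01"),
]

usage = Counter()
for count, v_gene in annotated:
    usage[v_subgroup(v_gene)] += count

for subgroup, count in sorted(usage.items()):
    print(subgroup, count)
```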
The clonotype distribution was analyzed based on V-J usage and the relationship between V and J genes. The results for each sample are shown in Figures 4.6.1 and 4.6.2.
Figure 4.6.1 Chord diagram of TCR alpha
Figure 4.6.2 Chord diagram of TCR beta
To explore the relationship across samples, correlation coefficients were calculated based on CDR3 amino acid sequences. The result is shown in Figure 4.7.
Figure 4.7 Sample correlation based on CDR3 amino acid sequences.
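
The report does not say which correlation measure was used. As one possible reading, the sketch below computes a Pearson correlation between two samples over the union of their CDR3 amino acid clonotypes, treating a clonotype absent from a sample as frequency 0. The frequency values are hypothetical and were chosen only to make the example runnable.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlation(freq_a, freq_b):
    """Correlate two samples over the union of their CDR3aa clonotypes."""
    clones = sorted(set(freq_a) | set(freq_b))
    x = [freq_a.get(clone, 0.0) for clone in clones]
    y = [freq_b.get(clone, 0.0) for clone in clones]
    return pearson(x, y)


# Hypothetical CDR3aa frequencies (percent) for two samples.
sample_1 = {"CAVLNAGGTSYGKLTF": 1.9, "CPRRALTF": 1.6, "CAVMDSNYQLIW": 0.8}
sample_2 = {"CAVLNAGGTSYGKLTF": 1.2, "CAVMDSNYQLIW": 0.9, "CALIAQAGTALIF": 0.3}

print(round(sample_correlation(sample_1, sample_2), 3))
```

A rank-based measure such as Spearman correlation would be less sensitive to a few dominant clones; either choice fits the description in the text.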
The clonal abundance distribution was calculated with confidence intervals derived via bootstrapping. The clonal diversity of the repertoire was assessed using the Hill diversity index [6]. Abundance curves are shown in Figure 4.8.1 and diversity curves in Figure 4.8.2.
Figure 4.8.1 Abundance curves.
Figure 4.8.2 Diversity curves.
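
Reference [6] defines a family of diversity indices, commonly called Hill numbers, of the form D(q) = (sum of p_i^q)^(1/(1-q)), where p_i is the relative abundance of clone i and q controls how strongly abundant clones are weighted. The sketch below is a minimal implementation under that assumption. The report does not state which orders q were plotted or how the bootstrap was set up, and the counts shown are hypothetical.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q for a list of clone read counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


# Hypothetical clone counts: one dominant clone and three rare ones.
counts = [97, 1, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_number(counts, q), 3))
```

At q = 0 the index equals the number of distinct clones, and it decreases as q grows whenever the repertoire is uneven, which is the behavior a diversity curve displays.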
[1] Murphy K. Janeway's Immunobiology. New York: Garland Science (2012).
[2] Bolger, AM. et al., Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. (2014) 30(15): 2114-2120.
[3] Lefranc, MP. et al., The international ImMunoGeneTics database. Nucleic Acids Res. (1999) 27 (1): 209-212.
[4] Andrews S. et al., FastQC: a quality control tool for high throughput sequence data. (2010) Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
[5] Ye J. et al., IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. (2013) 41: W34-W40.
[6] Hill, M. et al., Diversity and evenness: a unifying notation and its consequences. Ecology (1973) 54:427-432.